Immunoglobulin Variable Region Libraries

ABSTRACT

Antigen-specific immunoglobulin V-regions are identified from a library of nucleic acids amplified using polymerase chain reaction using leader sequence-specific forward primers. The use of leader sequence primers allows all V-region sequences to be amplified (including those with extensive 5′ end mutations) without loss of the original 5′ V gene segment sequence. These libraries can be screened for antigen-specific V-regions using eukaryotic cells engineered to express the amplified V-region-encoding nucleic acids or using bacterial phage display techniques. In the latter, a second V-region library is made using a larger than conventional set of 5′ V-region primers. The sequence errors introduced into the amplification products by this method are corrected using sequence information obtained in the products amplified by the V-region primers to screen the library created using the leader sequence primers. Amino acid sequence information from fragments of donor immunoglobulins can be used to assist in the identification of nucleic acids encoding the heavy and light chains of donor antibodies as well as to design primers to amplify such nucleic acids.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. nonprovisional patent application Ser. No. 15/584,360 filed on May 2, 2017, which is a continuation of U.S. nonprovisional patent application Ser. No. 15/248,413 filed on Aug. 26, 2016 (now U.S. Pat. No. 9,670,483), which is a continuation of U.S. nonprovisional patent application Ser. No. 14/939,454 filed on Nov. 12, 2015 (now U.S. Pat. No. 9,453,217), which is a continuation of U.S. nonprovisional patent application Ser. No. 13/547,497 filed on Jul. 12, 2012 (now U.S. Pat. No. 9,234,030), which claims priority from U.S. provisional patent application No. 61/506,749 filed on Jul. 12, 2011.

SEQUENCE LISTING

The instant application contains a Sequence Listing which has been submitted in ASCII format via EFS-Web and is hereby incorporated by reference in its entirety. Said ASCII copy, created on Jul. 12, 2012, is named 54070120.txt and is 40,626 bytes in size.

FIELD OF THE INVENTION

The invention relates generally to the fields of immunology and molecular biology. More particularly, the invention relates to methods, compositions, and kits for identifying “affinity-matured” human antibodies (Abs) containing somatic hypermutations.

BACKGROUND

Throughout the 19th and 20th centuries, scientists and physicians dreamt of being able to custom design therapeutic agents that were highly specific for a single biological target. By selectively attacking disease while sparing healthy tissue, these “magic bullets” were thought to be ideal therapeutic agents. It was not until the mid-1970s, however, that this dream began to be realized when Kohler and Milstein (Nature 256: 495, 1975) developed a ground-breaking method for making antigen (Ag)-specific monoclonal antibodies (mAbs). This was perhaps one of the most significant events in the history of modern immunology. Using this new approach, numerous research, diagnostic, and therapeutic products were developed with a combined market of hundreds of billions of dollars.

In Kohler and Milstein's method, rodent B cells which each express a unique immunoglobulin (Ig) (i.e., a mAb) were immortalized by fusion with a myeloma cell. The ability of these mAb-secreting “hybridomas” to replicate ad infinitum allowed those that produced a mAb with high affinity for a given Ag to be selected as an engine for producing large quantities of identical mAbs. MAbs made by this method eventually were used as human therapeutics. While effective, the use of therapeutic rodent mAbs was soon discovered to be less than ideal because, after multiple administrations, they induced anti-rodent Ig immune responses, which neutralized the activity of the mAb and frequently caused serious adverse reactions in treated patients.

Attempts were therefore made to adapt Kohler and Milstein's rodent method to make human mAbs by fusing human B cells to myeloma cells. This “human hybridoma” technique proved to be more challenging than expected for a number of different reasons. First, for ethical or medical reasons, it was often difficult to increase the frequency of Ag-specific B cells by repeatedly immunizing human subjects with an Ag. Second, also for ethical or medical reasons, it was not practical to isolate human spleen tissue to obtain B cells as was done in rodents. Third, immortalized mAb-producing human cell lines were not easy to establish and tended to produce very little mAb. And fourth, because the human immunoglobulin diversity, or repertoire, of individual Ab specificities is much higher than the rodent repertoire (~10¹² vs. ~10⁷ individual specificities), and because the screening methodology employed was limited to no more than about 10⁴ specificities in a given sampling, it is much more technically difficult to isolate human mAb-secreting cells of a desired specificity. Identifying mAbs having low frequency specificities by this technique thus remains a daunting, “finding a needle in the haystack” effort.

In parallel with the development of “human hybridoma” methods, scientists also began attempting to “humanize” rodent antibodies by replacing various rodent amino acid sequences with those from human antibodies. In a typical application, murine Ig variable regions are combined with human constant domains, or murine complementarity determining regions (CDRs) are grafted onto a human Ig framework. This new recombinant nucleic acid construct is inserted into an expression vector which is transfected into host cells that can produce large quantities of the chimeric mAb when cultured. While this technique has done much to improve the usefulness of mAbs in therapeutic applications (by reducing the human anti-rodent-immunoglobulin response), the remaining rodent sequences can still cause the development of an undesirable anti-mAb response [e.g., a HAMA (human anti-mouse antibody) response] that can neutralize the mAb or even cause a serious adverse reaction in a patient.

Another approach for producing human mAbs has been to use chimeric animals wherein the host animal's Ig genes have been replaced, in part, with their human counterparts. Upon vaccination, these knockout/knockin animals produce human Abs and their B cell-containing spleens can be used to make human mAbs using the conventional hybridoma technique. Shortcomings that have been associated with this technology include: reduced diversity (specificity) and affinity due to an incomplete repertoire of human Ig-related genes; humoral responses that are not purely human (“humanized” transgenic animals retain their native T cells and have “leaky” expression of murine antibody responses); non-human glycosylation patterns in the antibodies produced (e.g., Galα1-3Gal-containing oligosaccharides in the humanized animals are not commonly found in humans and are thus recognized as foreign); and the fact that antibodies that undergo development in an animal are not “selected” for tolerance in a human. Thus, even though derived from partial human sequences, such antibodies are not necessarily non-immunogenic in humans (i.e., not truly human).

The advent of polymerase chain reaction (PCR) methods made it possible to amplify entire Ig genes or parts of them, and allowed the creation of libraries of plasmids containing cDNAs encoding heavy and light Ig chain variable regions (V-regions) from peripheral blood leukocytes (PBLs). Traditionally, library generation for phage display involved a set of 6 primers in the variable region of heavy chain, 4 primers in the variable region of kappa chain, and 9 primers in the variable region of lambda chain (as described in Barbas et al., Phage Display: A Laboratory Manual. Cold Spring Harbor, 2001). The plasmids from these libraries can be used to transfect bacteria or other host cells (e.g., yeast) which can then be screened for Ag-specific Ig components. In the phage display approach, human Ig V-region genes are cloned into bacteriophage in order to display Ab fragments on the surfaces of bacteriophage particles. Genes from extensive human Ab gene libraries are inserted into a library of phage. Each phage can potentially carry the gene for a different V-region and displays a different V-region on its surface. Commonly these phage are used to display on a bacteriophage surface an Ab construct made up of heavy and light Ig chain V-regions separated by a linker (a single chain fragment variable Ab or scFv), such that the heavy and light chain V-regions can bind a target antigen in a manner that closely mimics how a mAb binds Ag. Similar to phage display, in yeast display, Ig components are displayed as a fusion to the Aga2p protein on the surface of yeast in a manner that projects the fusion protein away from the cell surface. Flow cytometry can then be used to select those yeast cells expressing an Ig component that reacts with a target Ag. Once identified by phage or yeast display, the expressed heavy and/or light chain V-region genes are recovered and inserted into full-length heavy and/or light chain expression vectors. Culturing host cells transfected with such vectors allows large quantities of a mAb to be produced.

The primary benefit of phage or yeast display techniques is that huge libraries, incorporating as many as 100 billion distinct V-regions, may be screened for Ag reactivity. A drawback of these techniques is that the V-regions isolated in the first round of display are typically of very low affinity. A common reason for these low affinities is the loss, in this method, of crucial hypermutated sequences (which create the mature high affinity antibodies). B lymphocytes that produce high affinity antibodies have generally undergone a process known as somatic hypermutation. In this process, germline encoded V-region sequences undergo genetic mutations, such that the gene sequence of the V-region is altered. This can result in the production of a “mature” V-region sequence that has substantially higher affinity for a given antigen. It is generally understood that somatic hypermutation is a fundamentally important event for generating antibody diversity and high affinity antibodies. The resulting somatic hypermutations cannot be found in the germline genomic sequences, and are represented only in a unique B lymphocyte clone.

The generation of low affinity antibodies using the phage display approach has been a major limitation to this method. This difficulty has indeed resulted in the development of many other complex approaches, including humanized mice and in vitro somatic hypermutation, as described above. Antibodies in humans undergo a selection process that makes them tolerated in the human body. These techniques result in the use of antibody sequences that are not based on human in vivo selection processes, and thus these antibodies can be immunogenic in humans. In any event, overcoming the production of low-affinity antibodies has been a major challenge for these methods.

A reason why phage display commonly generates low affinity antibodies is that it involves the use of primer sequences that are based on germline V-region sequences. There exist two possible outcomes from using V-region primers based on germline sequences: (1) the primers will fail to hybridize with somatically mutated sequences, since the mismatch between the primer and hypermutated sequences fails to result in primer annealing; or, (2) the mismatch between the primer and hypermutated sequences is conservative, such that annealing does occur, but the amplification results in recapitulation of the germline-sequence encoded by the primer rather than the underlying hypermutated sequence. In either event, since the phage display methodology involves crucial PCR amplification steps using PCR primer sequences that encode germline V-region sequences, and since these primers amplify the Ig sequences from inside the antibody V-regions, cDNA libraries generated using germline primer sequences will not contain the full repertoire of somatic hypermutated sequences. Furthermore, and importantly, the repertoire will be completely devoid of antibodies with hypermutation in the 5′ end of the V-region, i.e., the region encoded by the germline encoded primer.

In other words, using primers based on germline sequences recapitulates the germline V-region sequences, thereby losing hypermutations in the area encoded by the primer sequence. This method thus systematically eliminates hypermutation in these areas of the V-region and results in an overall loss of hypermutated antibody sequences. In the case of low frequency Abs (i.e., less than 0.1, 0.01, 0.001, 0.0001, or 0.00001% of a subject's entire repertoire of Ag specificities), for which the phage display system is arguably most suited, finding V-region genes that can be used to make a high affinity mAb becomes all the more challenging.
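
The two outcomes of germline-based priming described above can be illustrated with a toy model. The following Python sketch is an illustration only and is not taken from this specification: the sequences, the `amplify` function, and the mismatch threshold are hypothetical, and real primer annealing depends on temperature, primer length, and mismatch position, none of which is modeled here.

```python
def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def amplify(template: str, primer: str, site_start: int, max_mismatches: int = 1):
    """Toy forward-primer extension (hypothetical model).

    Returns None if the primer has too many mismatches to anneal (outcome 1).
    Otherwise returns the amplicon, whose 5' end is the primer's own sequence,
    so any template mutation under the primer is overwritten (outcome 2).
    """
    site = template[site_start:site_start + len(primer)]
    if len(site) < len(primer) or count_mismatches(primer, site) > max_mismatches:
        return None
    return primer + template[site_start + len(primer):]


# Hypothetical sequences, chosen only for illustration.
LEADER = "ATGGACTGGACCTGGAGG"              # leader segment upstream of the V-region
GERMLINE_5P = "CAGGTGCAGCTGGTGCAG"         # germline 5' end of the V-region
MUTATED_5P = "CAGGTGCAACTGGTGCAG"          # same region with one somatic substitution
HEAVILY_MUTATED_5P = "CAAGTACAACTCGTGCAG"  # same region with four substitutions
BODY = "TCTGGGGCTGAGGTG"                   # remainder of the V-region

template = LEADER + MUTATED_5P + BODY
heavy_template = LEADER + HEAVILY_MUTATED_5P + BODY
v_start = len(LEADER)

# Germline V-region primer: anneals despite one mismatch, but overwrites the mutation.
germline_amplicon = amplify(template, GERMLINE_5P, v_start)
# Germline V-region primer: four mismatches, so no product at all.
failed_amplicon = amplify(heavy_template, GERMLINE_5P, v_start)
# Leader primer: anneals upstream, so the mutated 5' end is copied from the template.
leader_amplicon = amplify(template, LEADER, 0)

print(MUTATED_5P in germline_amplicon)  # False: the somatic mutation was lost
print(failed_amplicon)                  # None: the clone is missing from the library
print(MUTATED_5P in leader_amplicon)    # True: the somatic mutation is preserved
```

In this model the germline-primed product either does not exist or carries the primer's germline sequence at its 5′ end, whereas priming from the leader leaves the 5′ end of the V-region as it occurs in the template.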

SUMMARY

The invention is based on the development of an improved method of identifying Ag-specific Ig V-regions that occur in low frequencies in human Ab repertoires such that these V-regions can be used to make Ag-specific human mAbs or other Ag-targeting constructs. The invention also allows the identification of more complete V-region libraries, particularly those that encode antibody sequences that contain crucial somatic mutations for generating high affinity antibodies.

The development of this improved method was based on the surprising discovery that traditional phage and yeast display methods allow many of the V-regions in human peripheral blood mononuclear cells (PBMCs) that have high affinity for a target Ag to escape identification. In traditional phage and yeast display methods, PCR is used to amplify nucleic acid sequences corresponding to the V-regions of heavy and/or light Ig chains. The forward primers that initiate the PCR amplification process have a nucleic acid sequence that is complementary to the 5′ germline end of each V-region so that cDNAs encoding the V-regions will be amplified.

Due to the several processes responsible for the generation of Ab diversity, the amino acid sequences of V-regions of mature Abs differ from germline sequences and from each other. Ig gene segments in mammals are arranged in groups of variable (V), diversity (D), joining (J), and constant (C) exons. The DNA for the human heavy chain includes about 50 functional VH segments, 30 DH segments, and 6 JH segments. The kappa chain includes about 40 functional Vκ segments, five Jκ segments, and 1 Cκ segment. The lambda chain DNA contains 30 Vλ segments, and 4 each of Jλ and Cλ segments. Junctional diversity results from the imprecise joining of gene segments, and during recombination, non-template nucleotides may be added between adjoining gene segments by terminal deoxynucleotidyl transferase, and become a part of the complementarity determining region (CDR). The assembled V-D-J or V-J segments then get diversified by stepwise incorporation of single nucleotide substitutions (somatic hypermutation) leading to greater antibody diversity and affinity maturation.
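
As a rough worked example of the segment counts given above, the following Python sketch multiplies out the number of possible germline segment combinations. It is a back-of-the-envelope illustration only: the counts are the approximate figures stated in the text, the variable names are hypothetical, and the assumption that any heavy chain can pair with any light chain is a simplification introduced here.

```python
from math import prod

# Approximate segment counts as stated in the text (C segments are omitted
# because they do not contribute to the V-region).
heavy_segments = {"VH": 50, "DH": 30, "JH": 6}
kappa_segments = {"Vk": 40, "Jk": 5}
lambda_segments = {"Vl": 30, "Jl": 4}

heavy_combinations = prod(heavy_segments.values())    # 50 * 30 * 6
kappa_combinations = prod(kappa_segments.values())    # 40 * 5
lambda_combinations = prod(lambda_segments.values())  # 30 * 4
light_combinations = kappa_combinations + lambda_combinations

# Assumption for illustration: free pairing of any heavy with any light chain.
paired_combinations = heavy_combinations * light_combinations

print(heavy_combinations)   # 9000
print(light_combinations)   # 320
print(paired_combinations)  # 2880000
```

Under these assumptions, segment recombination alone gives on the order of 10⁶ combinations, far fewer than the roughly 10¹² specificities cited in the Background. The difference is attributable to the junctional diversity and somatic hypermutation described above, which this arithmetic does not count.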

The three CDR regions on each V-region heavy and light chain show the most sequence diversity. Because of this diversity, these regions are also called hypervariable regions. The six CDRs of the heavy and light chains are crucial to determining the Ag-binding specificity of an Ab. Because the 5′ germline end of each V-region is not within a hypervariable region, the conventional method of using forward primers directed to germline sequences at this site was not believed to be problematic. And indeed, for identifying those V-regions in high frequency human Abs, this method has proved to be very useful.

In work leading up to the invention, attempts were made to identify low frequency Abs. Although Abs of the desired specificity could be identified in peripheral blood samples using techniques such as ELISA (enzyme-linked immunosorbent assay), when these samples were used in conventional phage display methods, no high affinity Abs could be generated. This is a general problem in the industry and complex in vitro methods of making mutations to create combinatorial libraries of V-regions have been developed. These methods are difficult, laborious, and do not necessarily produce antibodies that maintain native human sequences.

This invention stems from troubleshooting the reasons for the failure of the phage system to generate high affinity antibodies for these low frequency antibodies. While considering previous experiments showing that (a) there is a considerable rate of mutation away from germline in the amino acid sequence in the 5′ end of V-regions and (b) two amino acid substitutions in the sections of V-regions outside the CDRs had a significant effect on an Ab's affinity for Ag, the inventors came to the startling realization that the conventional method of amplifying V-region sequences using primers for 5′ germline end sequences resulted in libraries that were missing many of those V-regions that have mutations at their 5′ end, and also introduced sequence errors in this area during the PCR process. Even under stringent PCR conditions, the germline sequence primers cross-hybridize with V-regions that contain some number of somatic mutations. Thus the primers, when amplifying the V-region, recapitulate the germline sequence in the cDNA. The somatic hypermutations of the true high-affinity V-region will thus be lost. Importantly, it was also realized that it was those missing V-regions that would be expected to be included in the highest affinity Abs. The first ten amino acids in a V-region can and often do contain somatic hypermutations. A single amino acid change in this region can affect antibody binding.

In the innovative approach described herein, rather than amplifying V-regions using PCR primers for 5′ V-region sequences, V-regions are amplified using leader sequence-specific forward primers. Because the leader sequence is just upstream of the V-region genes, all V-region sequences, including those with 5′ end mutations, are amplified. The resulting library therefore contains a more complete V-region repertoire than those prepared by the conventional method (which results in the loss of all V-region sequences that are somatically hypermutated). Thus isolating low frequency V-regions that have been somatically mutated is possible. The reverse primers can be selected to allow amplification of all subclasses of an Ig isotype (e.g., gamma, mu, alpha, kappa, lambda, etc.), or the individual sub-classes of heavy chains (e.g., gamma 3 or gamma 4) to have more focused libraries.

Expression of V-region nucleic acids amplified using leader sequence primers leads to the inclusion of leader peptides in the V-region protein product. When mammalian cells such as CHO cells are used to express these nucleic acids, endogenous post-translational processing mechanisms remove the leader peptide, allowing expression of the V-region or multiple V-region (e.g., heavy and light chain) constructs without the leader peptides interfering with Ag-binding screening assays. In bacterial phage display techniques, however, leader peptides are not removed. The presence of leader peptides can interfere in Ag-binding screening assays and thereby prevent detection of V-regions with high affinity for a target Ag.

To overcome this problem, the invention also provides for the creation of a second V-region library that is made using a larger than conventional set of 5′ V-region primers (“Second Step” amplification). This larger set of primers allows a greater number of V-region sequences to be amplified compared to the number amplified using a traditional primer set. Of course, because these primers are designed to amplify germline sequences and not somatically mutated sequences, the primers themselves will introduce sequence errors in the amplification products (i.e., so that the somatic mutations are removed and substituted with germline sequences). After phage display is used to isolate those V-regions in scFv-expressing phage that bind target Ag, the nucleic acids encoding the identified V-regions are then sequenced. Using that sequence information, reverse primers that specifically amplify those nucleic acids are made and used along with leader sequence forward primers to amplify the V-regions from the first library to identify the actual and complete (somatic hypermutated) sequences of the Ag-specific V-regions that occur in the donor from which the first library was made.
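
The correction step described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the procedure of the invention: libraries are modeled as lists of sense-strand strings, PCR is reduced to a substring search, and all sequences, function names, and the 12-nucleotide primer length are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(COMPLEMENT)[::-1]


def design_reverse_primer(hit: str, length: int = 12) -> str:
    """Reverse primer taken from the clone-specific 3' end of a screened hit."""
    return reverse_complement(hit[-length:])


def recover_from_first_library(first_library, reverse_primer):
    """Return leader-primed clones whose sense strand carries the primer's target site."""
    site = reverse_complement(reverse_primer)
    return [clone for clone in first_library if site in clone]


# Hypothetical sequences, chosen only for illustration.
LEADER = "ATGGACTGGACCTGGAGG"
GERMLINE_5P = "CAGGTGCAGCTGGTGCAG"
NATIVE_5P = "CAGGTGCAACTGGTGCAG"  # carries one somatic substitution

# First library: leader-primed clones that keep the donor's native 5' ends.
clone_a = LEADER + NATIVE_5P + "TCTGGG" + "GCGAGAGATTAC"
clone_b = LEADER + GERMLINE_5P + "TCTGGG" + "GCGAAAGGGCTT"
first_library = [clone_a, clone_b]

# Hit from the second (V-region-primed) library: the 5' end was overwritten
# by the germline primer, but the 3' end still identifies the clone.
hit = GERMLINE_5P + "TCTGGG" + "GCGAGAGATTAC"

reverse_primer = design_reverse_primer(hit)
recovered = recover_from_first_library(first_library, reverse_primer)
native_5p = recovered[0][len(LEADER):len(LEADER) + len(GERMLINE_5P)]

print(recovered == [clone_a])               # True
print(native_5p != hit[:len(GERMLINE_5P)])  # True: the hit's 5' end was an artifact
```

The design point is that the screened hit is trusted only for the portion not encoded by the germline primer. That portion is used to locate the corresponding clone in the leader-primed library, which supplies the native 5′ sequence.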

In addition to the foregoing, a proteomics approach is also used wherein plasma IgG is isolated (e.g., using protein G chromatography followed by Ag-affinity chromatography). The resulting Ag-specific IgG mixture is further separated by capillary isoelectric focusing. The IgGs are then digested with proteases and the peptide fragment amino acid sequences are determined by mass spectrometry (MS). In the event that Second Step amplification fails to generate the appropriate somatic hypermutated V-region sequences (i.e., V-region germline primers failed to cross-hybridize/anneal with the desired 5′ V-region sequence, which contained somatic mutated sequence from the first round amplification), de novo sequencing information of the isolated Ag-specific IgG can be used to perform nucleic acid sequencing of V-region products of the first round cDNA. Specifically, partial sequencing data from MS de novo protein sequencing is used to design V-region reverse primers. These reverse primers can be used together with leader sequence forward primers to identify and sequence the 5′ ends of the V-region. Once these hypermutated regions have been identified, complete V-region sequences can be amplified using conventional PCR. These V-regions can then be assembled into complete, somatically hypermutated antibodies. MS peptide sequence information can also be used to corroborate that Ag-specific V-regions were correctly identified in the library screening process.

Accordingly, the invention features leader-specific primers for amplifying a nucleic acid encoding the V-region of a human immunoglobulin. The primers can include (a) a nucleic acid sequence that binds under stringent hybridization conditions to the complement of a nucleotide sequence selected from SEQ ID NOs:1-76, or (b) a nucleic acid sequence selected from SEQ ID NOs:1-76.

In another aspect, the invention features a set of leader-specific primers for amplifying at least 90% (e.g., at least 90, 91, 92, 93, 94, 95, 96, 97, 98, or 99%) of the nucleic acids encoding the V-region of a human immunoglobulin in a human tissue (e.g., peripheral blood) sample. The set can be one that amplifies 100% of the nucleic acids encoding the V-region of a human immunoglobulin in a human tissue sample. The set can include at least 50 different primers, and each of the different primers can include a different nucleic acid sequence, e.g., those of SEQ ID NOs:1-76. The set can also include at least 32 different primers, and each of the different primers can include a different nucleic acid sequence, e.g., those of SEQ ID NOs:1-32. The set can include at least 15 different primers, and each of the different primers can include a different nucleic acid sequence, e.g., those of SEQ ID NOs:33-48. The set can include at least 27 different primers, and each of the different primers can include a different nucleic acid sequence, e.g., those of SEQ ID NOs:49-76.

Also within the invention is a phage library that includes at least 10⁵ (e.g., at least 10⁵, 10⁶, or 10⁷) nucleic acid molecules encoding immunoglobulin heavy chains having somatic hypermutations in the first 8 amino acids of the variable region.

The invention further includes a purified antibody that includes a variable region sequence determined by screening a phage display library including variable region-encoding nucleic acid sequences isolated by PCR amplification of B lymphocytes isolated from a human subject using leader-specific primers.

In another aspect, the invention features a method that includes the steps of: (a) using a first PCR to amplify human immunoglobulin heavy and light chain nucleotide sequences from a human tissue sample including B lymphocytes, wherein leader-specific primers are used in the PCR amplification; (b) isolating the amplification products from the first PCR step; (c) creating a first set of constructs by inserting the isolated amplification products from the first PCR step into expression vectors; and (d) introducing the first set of constructs into a first set of host cells to create a first library of human immunoglobulin nucleotide sequences. In this method, the leader-specific primers can include a set of different polynucleotides each of which binds under stringent hybridization conditions to the complement of a different nucleic acid sequence selected from the group consisting of SEQ ID NOs:1-76. The method can also include one or more of the steps of: (e) using a second PCR to amplify human immunoglobulin nucleotide sequences from the first library or the human tissue sample, wherein variable region-specific primers are used in the PCR amplification; (f) isolating a second set of amplification products from the second PCR step; (g) creating a second set of constructs by inserting the second set of isolated amplification products into expression vectors; (h) introducing the second set of constructs into a second set of host cells to create a second library of human immunoglobulin nucleotide sequences; (i) screening the second library for a nucleotide sequence that encodes a variable region that specifically binds a target antigen; (j) obtaining the sequence of the nucleotide sequence that encodes a variable region that specifically binds a target antigen; (k) using the obtained sequence information to generate a reverse primer specific for the nucleotide sequence that encodes a variable region that specifically binds a target antigen; and (l) using the reverse primer and the leader-specific primers in a third PCR to amplify a nucleotide sequence from the first library that corresponds to the sequence of an immunoglobulin in a B lymphocyte in the human tissue sample that promotes binding of the immunoglobulin to the target antigen. Proteomics can also be used in the methods of the invention as described herein.

Additionally within the invention is a human immunoglobulin cDNA library that includes a complete set of somatic hypermutated V-region sequences.

Unless otherwise defined, all technical terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Commonly understood definitions of biological terms can be found in Rieger et al., Glossary of Genetics: Classical and Molecular, 5th edition, Springer-Verlag: New York, 1991; and Lewin, Genes V, Oxford University Press: New York, 1994. Commonly understood definitions of medical terms can be found in Stedman's Medical Dictionary, 27th Edition, Lippincott, Williams & Wilkins, 2000.

As used herein, an “antibody” or “Ab” is an immunoglobulin (Ig), a solution of identical or heterogeneous Igs, or a mixture of Igs. An “antibody” can also refer to fragments and engineered versions of Igs such as Fab, Fab′, and F(ab′)₂ fragments; and scFv's, heteroconjugate Abs, and similar artificial molecules that employ Ig-derived CDRs to impart antigen specificity. A “monoclonal antibody” or “mAb” is an Ab expressed by one clonal B cell line or a population of Ab molecules that contains only one species of an antigen binding site capable of immunoreacting with a particular epitope of a particular antigen. A “polyclonal antibody” or “polyclonal Ab” is a mixture of heterogeneous Abs. Typically, a polyclonal Ab will include myriad different Ab molecules which bind a particular antigen with at least some of the different Abs immunoreacting with a different epitope of the antigen. As used herein, a polyclonal Ab can be a mixture of two or more mAbs.

An “antigen-binding portion” of an Ab is contained within the variable region of the Fab portion of an Ab and is the portion of the Ab that confers antigen specificity to the Ab (i.e., typically the three-dimensional pocket formed by the CDRs of the heavy and light chains of the Ab). A “Fab portion” or “Fab region” is the proteolytic fragment of a papain-digested Ig that contains the antigen-binding portion of that Ig. A “non-Fab portion” is that portion of an Ab not within the Fab portion, e.g., an “Fc portion” or “Fc region.” A “constant region” of an Ab is that portion of the Ab outside of the variable region.

When referring to a protein molecule such as an Ab, “purified” means separated from components that naturally accompany such molecules. Typically, an Ab or protein is purified when it is at least about 10% (e.g., 9%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 95%, 98%, 99%, 99.9%, and 100%), by weight, free from the non-Ab proteins or other naturally-occurring organic molecules with which it is naturally associated. Purity can be measured by any appropriate method, e.g., column chromatography, polyacrylamide gel electrophoresis, or HPLC analysis. A chemically-synthesized protein or other recombinant protein produced in a cell type other than the cell type in which it naturally occurs is “purified.”

By “bind”, “binds”, or “reacts with” is meant that one molecule recognizes and adheres to a particular second molecule in a sample, but does not substantially recognize or adhere to other molecules in the sample. Generally, an Ab that “specifically binds” another molecule has a Kd greater than about 10⁵, 10⁶, 10⁷, 10⁸, 10⁹, 10¹⁰, 10¹¹, or 10¹² liters/mole for that other molecule.

As used herein with respect to nucleic and amino acids, the term “percent sequence identity” means the number of exact matches between two aligned sequences divided by the length of the shorter sequence and multiplied by 100. Alignment of nucleic acid sequences can be performed as described in Smith and Waterman, Advances in Applied Mathematics 2:482-489 (1981), and for amino acid sequences as described in Dayhoff, Atlas of Protein Sequences and Structure, M. O. Dayhoff ed., 5 suppl. 3:353-358, National Biomedical Research Foundation, Washington, D.C., USA, and Gribskov (1986) Nucl. Acids Res. 14:6745. Two DNA sequences, or two polypeptide sequences are “substantially homologous” to each other when the sequences exhibit at least about 80%-85%, at least about 85%-90%, at least about 90%-95%, or at least about 95%-98% sequence identity over a defined length of the molecules, as determined using the methods above. As used herein, substantially homologous also refers to sequences showing complete identity to the specified DNA or polypeptide sequence. DNA sequences that are substantially homologous can be identified in a Southern hybridization experiment under, for example, stringent conditions, as defined for that particular system. Defining appropriate hybridization conditions is within the skill of the art. See, e.g., Sambrook et al., Molecular Cloning: A Laboratory Manual, Second Edition, (1989) Cold Spring Harbor, N.Y.; Nucleic Acid Hybridization: A Practical Approach, editors B. D. Hames and S. J. Higgins, (1985) Oxford; Washington, D.C.; IRL Press.
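
The percent sequence identity definition above can be expressed as a short calculation. The Python sketch below is one possible reading, offered for illustration only: it assumes the two sequences have already been aligned by a separate method (for example, the Smith and Waterman approach cited above), that gaps are written as "-", and that "length of the shorter sequence" means the ungapped length. The function name and example sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Exact matches divided by the ungapped length of the shorter sequence, times 100.

    Both inputs must be pre-aligned strings of equal length, with gaps
    marked by the `gap` character (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # A gap aligned to a gap is not counted as a match.
    matches = sum(x == y and x != gap for x, y in zip(aligned_a, aligned_b))
    shorter = min(len(aligned_a.replace(gap, "")), len(aligned_b.replace(gap, "")))
    if shorter == 0:
        raise ValueError("sequences must not be empty")
    return 100.0 * matches / shorter


# Hypothetical aligned pair: 6 matching positions, shorter sequence is 7 long.
identity = percent_identity("ACGTACGT", "ACGTTCG-")
print(round(identity, 1))  # 85.7
```

With this reading, the example pair falls in the "at least about 85%-90%" band of the substantially homologous definition given above.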

As used herein, the term “stringent hybridization conditions” means hybridization in 6× sodium chloride/sodium citrate (SSC) at 45° C., followed by washing in 0.2× SSC, 0.1% SDS at a temperature of at least 60° C.

Although methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention, suitable methods and materials are described below. All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety. In the case of conflict, the present specification, including definitions, will control. In addition, the particular embodiments discussed below are illustrative only and not intended to be limiting.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic illustration of a step in a heavy chain PCR amplification strategy of the invention.

FIG. 2 is a schematic illustration of another step in the heavy chain PCR amplification strategy of the invention.

FIG. 3 is a schematic illustration of a step in a light chain PCR amplification strategy of the invention.

FIG. 4 is a schematic illustration of another step in the light chain PCR amplification strategy of the invention.

FIG. 5 is a graph showing that the presence of leader peptide upstream of either the heavy chain variable or the light chain variable region inhibited the ScFv fragment from binding to its antigen. ScFv fragments containing a leader on the heavy chain (LC-LHC), a leader on the light chain (LLC-HC), or leaders on both the heavy and light chains (LLC-LHC) show reduced binding to the antigen when compared to ScFv fragments with no leader (LC-HC).

DETAILED DESCRIPTION

The invention encompasses methods, compositions, and kits relating to the identification of human antibody sequences containing somatic hypermutations. The preferred embodiments described below illustrate adaptation of these methods, compositions, and kits. Nonetheless, from the description of these embodiments, other aspects of the invention can be made and/or practiced based on the description provided below.

General Methodology

Methods involving conventional immunological and molecular biological techniques are described herein. Immunological methods (for example, assays for detection and localization of antigen-Ab complexes, immunoprecipitation, immunoblotting, and the like) are generally known in the art and described in methodology treatises such as Current Protocols in Immunology, Coligan et al., ed., John Wiley & Sons, New York. Techniques of molecular biology are described in detail in treatises such as Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Sambrook et al., ed., Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y., 2001; and Current Protocols in Molecular Biology, Ausubel et al., ed., Greene Publishing and Wiley-Interscience, New York. Ab methods are described in Handbook of Therapeutic Abs, Dubel, S., ed., Wiley-VCH, 2007.

Overview

In traditional phage and yeast display methods for identifying Ag-specific Ig V-regions, PCR is used to amplify nucleic acid sequences corresponding to the V-regions of heavy and/or light Ig chains. The forward primers that initiate the PCR amplification process have a nucleic acid sequence that is the complement of the 5′ germline end of each V-region so that cDNAs encoding the V-regions will be amplified. In the case where the 5′ V gene segment differs from germline due to somatic hypermutation, this PCR process preferentially amplifies products that have the 5′ germline sequence, which leads to loss of diversity in phage and yeast display libraries.

In one aspect of the invention, V-regions are amplified using leader sequence-specific forward primers. Because the leader sequence is just upstream of the V-region genes, all V-region sequences, including those with extensive 5′ end mutations, are amplified without loss of the original 5′ V gene segment sequence. Libraries made with this method therefore contain a more complete V-region repertoire than those prepared by the conventional method. These libraries can be screened for Ag-specific V-regions using mammalian cells engineered to express the amplified V-region-encoding nucleic acids. Endogenous post-translational processing mechanisms remove the leader peptide, allowing expression of the V-region or multiple V-region (e.g., heavy and light chain) constructs without the leader peptides interfering with Ag-binding screening assays. In bacterial phage display techniques, where leader peptides can interfere in Ag-binding screening assays, the invention also provides for the creation of a second V-region library that is made using a larger than conventional set of 5′ V-region primers. This larger set of primers allows a greater number of V-region sequences to be amplified compared to the number amplified using a traditional primer set, but as described above causes sequence errors to be introduced into the amplification products. Phage display is used to screen for those V-regions in scFv that bind the target Ag with high affinity. The nucleic acids encoding the identified V-regions are then sequenced, and, using that sequence information, reverse primers that specifically amplify those nucleic acids are made and used along with leader sequence forward primers to amplify the V-regions from the first library to identify the actual and complete sequences of the Ag-specific V-regions that occur in the donor from which the first library was made. Amino acid sequence information from fragments of donor Igs can be used to assist in the identification of nucleic acids encoding the heavy and light chains of donor Abs as well as to design PCR primers to amplify such nucleic acids.
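The difference between the two priming strategies described in this overview can be illustrated with a toy in silico model. In the sketch below, a PCR product is assumed to carry the primer sequence at its 5′ end rather than the template sequence. The mismatch tolerance of 3, the short leader extension, the mutated template, and the function names are all hypothetical values chosen for illustration; only the two primer sequences are taken from the tables.

```python
V_PRIMER = "CAGGTGCAGCTGGTGGAGTCTGG"     # VH3b-F, SEQ ID NO: 85
LEADER_PRIMER = "ATGGAGTTGGGGCTGAGCTGG"  # NA-LH3-d-F, SEQ ID NO: 13


def count_mismatches(a: str, b: str) -> int:
    return sum(1 for x, y in zip(a, b) if x != y)


def amplify_with_v_primer(v_region: str, primer: str, max_mismatches: int = 3):
    """Toy model: the product's 5' end is the primer, not the template."""
    site = v_region[:len(primer)]
    if count_mismatches(primer, site) > max_mismatches:
        return None  # assumed rule: too many mismatches, primer does not anneal
    return primer + v_region[len(primer):]


def amplify_with_leader_primer(leader: str, v_region: str, leader_primer: str):
    """Toy model: priming in the leader copies the V-region from the template."""
    if not leader.startswith(leader_primer):
        return None
    return leader + v_region


# Hypothetical template: leader followed by a V-region with two 5' somatic mutations.
leader = LEADER_PRIMER + "GTTTTCCTT"
mutated_v = "CAGGTGCAGCTGTTGGAGTCAGG" + "AGGCTTGGTC"

v_product = amplify_with_v_primer(mutated_v, V_PRIMER)
leader_product = amplify_with_leader_primer(leader, mutated_v, LEADER_PRIMER)

print(v_product[:23] == V_PRIMER)          # 5' end reverted to the primer sequence
print(leader_product.endswith(mutated_v))  # somatic mutations retained
```

In this model the V-region primer product has lost both 5′ mutations, whereas the leader primer product retains the donor sequence.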

PCR Primers

One aspect of the invention includes forward PCR primers for amplifying heavy and light chain human Ig V-regions from the nucleotides encoding their leader sequences. A set (e.g., 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, or more) of such primers that can be used to amplify 100% of these sequences is preferred to capture low frequency V-regions, although a set of primers that captures less than 100% (e.g., 70-99, 70-75, 75-80, 80-85, 85-90, 90-95, or 95-99%) of these sequences might also be used. An example of a set of these primers is set forth in Tables 1A, 1B, and 1C below. Using these primer sets (or a primer set with primers that are substantially homologous to those in Tables 1A, 1B, and 1C) and appropriate reverse primers, because the leader sequence is just upstream of the V-region genes, all V-region sequences, including those with extensive 5′ end mutations, can be amplified.

TABLE 1A Heavy chain leader forward primers
NA-LH1-a-F    ATG GAC TGG ACC TGG AGG ATC CTC TTC [SEQ ID NO: 1]
NA-LH1-b-F    ATG GAC TGG ACC TGG AGG ATC CTC TTT TTG [SEQ ID NO: 2]
NA-LH1-c-F    ATG GAC TGG ACC TGG AGC ATC CTT TTC [SEQ ID NO: 3]
NA-LH1-d-F    ATG GAC TGC ACC TGG AGG ATC CTC [SEQ ID NO: 4]
NA-LH1-e-F    ATG GAC TGG ACC TGG AGA ATC CTC TTC TTG [SEQ ID NO: 5]
NA-LH1-f-F    ATG GAC TGG ACC TGG AGG GTC TTC [SEQ ID NO: 6]
NA-LH1-g-F    ATG GAC TGG ATT TGG AGG ATC CTC TTC TTG [SEQ ID NO: 7]
NA-LH2-a-F    ATG GAC ACA CTT TGC TCC ACG CTC [SEQ ID NO: 8]
NA-LH2-b-F    ATG GAC ACA CTT TGC TAC ACG CTC CTG [SEQ ID NO: 9]
NA-LH3-a-F    ATG GAA TTG GGG CTG AGC TGG G [SEQ ID NO: 10]
NA-LH3-b-F    ATG GAG TTG GGA CTG AGC TGG ATT TTC CTT [SEQ ID NO: 11]
NA-LH3-c-F    ATG GAG TTT GGG GGG CTG AGC TGG GTT TTC CTT [SEQ ID NO: 12]
NA-LH3-d-F    ATG GAG TTG GGG CTG AGC TGG [SEQ ID NO: 13]
NA-LH3-e-F    ATG GAG TTT GGG CTG AGC TGG ATT [SEQ ID NO: 14]
NA-LH3-f-F    ATG GAG TTT GGG CTG AGC TGG GTT TT [SEQ ID NO: 15]
NA-LH3-g-F    ATG GAA CTG GGG CTC CGC TG [SEQ ID NO: 16]
NA-LH3-h-F    ATG GAG TTT GGG CTG AGC TGG C [SEQ ID NO: 17]
NA-LH3-i-F    ATG GAG TTT GGG CTG AGC TGG GTT [SEQ ID NO: 18]
NA-LH3-j-F    ATG GAG TTT GGA CTG AGC TGG GTT T [SEQ ID NO: 19]
NA-LH3-k-F    ATG GAG TTG GGG CTG TGC TGG [SEQ ID NO: 20]
NA-LH3-l-F    ATG GAG TTT GGG CTT AGC TGG GTT TTC [SEQ ID NO: 21]
NA-LH3-m-F    ATG GAG TTT TGG CTG AGC TGG GTT TTC [SEQ ID NO: 22]
NA-LH3-n-F    ATG ACG GAG TTT GGG CTG AGC TG [SEQ ID NO: 23]
NA-LH3-o-F    ATG GAG TTT GGG CTG AGC TGG G [SEQ ID NO: 24]
NA-LH4-a-F    ATG AAA CAC CTG TGG TTC TTC CTC CTG [SEQ ID NO: 25]
NA-LH4-b-F    ATG AAA CAC CTG TGG TTC TTC CTC CTC CTG [SEQ ID NO: 26]
NA-LH4-c-F    ATG AAG CAC CTG TGG TTC TTC CTC CTG [SEQ ID NO: 27]
NA-LH4-d-F    ATG AAA CAT CTG TGG TTC TTC CTT CTC CTG [SEQ ID NO: 28]
NA-LH4-e-F    ATG AAG CAC CTG TGG TTT TTC CTC CTG [SEQ ID NO: 29]
NA-LH5-a-F    ATG GGG TCA ACC GCC ATC CTC [SEQ ID NO: 30]
NA-LH5-b-F    ATG GGG TCA ACC GCC ATC CTT G [SEQ ID NO: 31]
NA-LH6-F      ATG TCT GTC TCC TTC CTC ATC TTC CTG C [SEQ ID NO: 32]

TABLE 1B Kappa chain leader forward primers
NA-Lk1-a-F    ATG GAC ATG AGG GTC CCC GC [SEQ ID NO: 33]
NA-LK1-b-F    ATG GAC ATG AGG GTC CCT GCT CAG [SEQ ID NO: 34]
NA-LK1-c-F    ATG GAC ATG AGA GTC CTC GCT CAG C [SEQ ID NO: 35]
NA-LK1-d-F    ATG GAC ATG AGG GTC CTC GCT CAG [SEQ ID NO: 36]
NA-Lk1-e-F    ATG GAC ATG AGG GTG CCC GC [SEQ ID NO: 37]
NA-LK1-f-F    ATA GAC ATG AGG GTC CCC GCT CAG [SEQ ID NO: 38]
NA-LK2-a-F    ATG AGG CTC CCT GCT CAG CTC C [SEQ ID NO: 39]
NA-LK2-b-F    ATG AGG CTC CTT GCT CAG CTT CTG G [SEQ ID NO: 40]
NA-LK3-a-F    ATG GAA ACC CCA GCG CAG CTT C [SEQ ID NO: 41]
NA-LK3-b-F    ATG GAA GCC CCA GCG CAG CT [SEQ ID NO: 42]
NA-LK3-c-F    ATG GAA GCC CCA GCT CAG CTT CT [SEQ ID NO: 43]
NA-LK3-d-F    ATG GAA CCA TGG AAG CCC CAG C [SEQ ID NO: 44]
NA-LK4-F      ATG GTG TTG CAG ACC CAG GTC TTC ATT TC [SEQ ID NO: 45]
NA-LK5-F      ATG GGG TCC CAG GTT CAC CTC C [SEQ ID NO: 46]
NA-LK6-a-F    ATG TTG CCA TCA CAA CTC ATT GGG TTT CTG [SEQ ID NO: 47]
NA-LK6-b-F    ATG GTG CCC TCG CTG CTC TTT C [SEQ ID NO: 48]

TABLE 1C Lambda chain leader forward primers
NA-LL1-a-F    ATG GCC TGG TCC CCT CTC TTC CTC [SEQ ID NO: 49]
NA-LL1-b-F    ATG GCC TGG TCT CCT CTC CTC CTC AC [SEQ ID NO: 50]
NA-LL1-c-F    ATG GCC AGC TTC CCT CTC CTC CTC [SEQ ID NO: 51]
NA-LL1-d-F    ATG GCC GGC TTC CCT CTC CTC C [SEQ ID NO: 52]
NA-LL1-e-F    ATG ACC TGC TCC CCT CTC CTC CTC AC [SEQ ID NO: 53]
NA-LL2-a-F    ATG GCC TGG GCT CTG CTC CTC [SEQ ID NO: 54]
NA-LL2-b-F    ATG GCC TGG GCT CTG CTG CTC [SEQ ID NO: 55]
NA-LL2-c-F    ATG GCC TGG GCT CTG CTA TTC CTC AC [SEQ ID NO: 56]
NA-LL3-a-F    ATG GCA TGG ATC CCT CTC TTC CTC GGC [SEQ ID NO: 57]
NA-LL3-b-F    ATG GCC TGG ACC GCT CTC CTT CTG [SEQ ID NO: 58]
NA-LL3-c-F    ATG GCC TGG ACC CCT CTC CTG C [SEQ ID NO: 59]
NA-LL3-d-F    ATG GCC TGG ATC CCT CTC CTG CTC [SEQ ID NO: 60]
NA-LL3-e-F    ATG GCC TGG ACC CCT CTC TGG C [SEQ ID NO: 61]
NA-LL3-f-F    ATG GCC TGG ACC GTT CTC CTC CTC [SEQ ID NO: 62]
NA-LL3-g-F    ATG GCA TGG GCC ACA CTC CTG C [SEQ ID NO: 63]
NA-LL3-h-F    ATG GCC TGG ATC CCT CTC CTT CTC CC [SEQ ID NO: 64]
NA-LL4-a-F    ATG GCC TGG GTC TCC TTC TAC CTA CTG C [SEQ ID NO: 65]
NA-LL4-b-F    ATG GCC TGG ACC CAA CTC CTC CTC [SEQ ID NO: 66]
NA-LL4-c-F    ATG GCC TGG ACC CCA CTC CTC C [SEQ ID NO: 67]
NA-LL4-d-F    ATG GCT TGG ACC CCA CTC CTC TTC CTC [SEQ ID NO: 68]
NA-LL5-a-F    ATG GCC TGG ACT CCT CTT CTT CTC TTG CTC [SEQ ID NO: 69]
NA-LL5-b-F    ATG GCC TGG ACT CCT CTC CTC CTC C [SEQ ID NO: 70]
NA-LL5-c-F    ATG GCC TGG ACT CTT CTC CTT CTC GTG C [SEQ ID NO: 71]
NA-LL6-F      ATG GCC TGG GCT CCA CTA CTT CTC ACC [SEQ ID NO: 72]
NA-L17-F      ATG GCC TGG ACT CCT CTC TTT CTG TTC CTC [SEQ ID NO: 73]
NA-LL8-F      ATG GCC TGG ATG ATG CTT CTC CTC GGA C [SEQ ID NO: 74]
NA-LL9-F      ATG GCC TGG GCT CCT CTG CTC [SEQ ID NO: 75]
NA-LL10-F     ATG CCC TGG GCT CTG CTC CTC [SEQ ID NO: 76]

In another aspect of the invention, a set of V-region PCR primers can be used to amplify nucleic acids encoding the heavy and light chain Ig variable regions. Preferred sets of such primers are those that allow a greater number of V-region sequences to be amplified compared to the number amplified using a traditional primer set. A set of such primers that can be used to amplify 100% of these sequences is preferred to capture low frequency V-regions, although a set of primers that captures less than 100% (e.g., 70-99, 70-75, 75-80, 80-85, 85-90, 90-95, or 95-99%) of these sequences might also be used. Examples of sets of such primers are set forth in Tables 2 and 3 below. Using these primer sets (or a primer set with primers that are substantially homologous to those in Tables 2 and 3) and appropriate reverse primers, most if not all V-region sequences can be amplified by PCR, although this process introduces errors in those chains with hypermutated 5′ V-regions (which can be corrected as described herein). For creation of scFv, each of the forward primers described herein can be engineered to include appropriate linker sequences such as the longlink sequence GGTGGTTCCTCTAGATCTTCCTCCTCTGGTGGCGGTGGCTCGGGCGGTGGTGGG [SEQ ID NO: 176] (which can be located 5′ to the primer sequence for the heavy chain and used with a suitable reverse primer such as SEQ ID NO: 99 shown at the bottom of Table 2) or the linkers shown upstream of the primer sequences in SEQ ID NOs: 142-175 (used with the corresponding reverse primers such as SEQ ID NOs: 114, 133, and 134).

TABLE 2
Primer name     Sequence
VH1a-F          CAG GTK CAG CTG GTG CAG TCT GG [SEQ ID NO: 77]
VH1b-F          CAG GTC CAG CTT GTG CAG TCT GG [SEQ ID NO: 78]
VH1c-F          SAG GTC CAG CTG GTA CAG TCT GG [SEQ ID NO: 79]
VH1d-F          CAR ATG CAG CTG GTG CAG TCT GG [SEQ ID NO: 80]
VH2a-F          CAG ATC ACC TTG AAG GAG TCT GG [SEQ ID NO: 81]
VH2b-F          CAG GTC ACC TTG ARG GAG TCT GG [SEQ ID NO: 82]
VH2c-F          CGG GTC ACC TTG AGG GAG TCT GG [SEQ ID NO: 83]
VH3a-F          GAR GTG CAG CTG GTG GAG TCT GG [SEQ ID NO: 84]
VH3b-F          CAG GTG CAG CTG GTG GAG TCT GG [SEQ ID NO: 85]
VH3c-F          GAG GTG CAG CTG TTG GAG TCT GG [SEQ ID NO: 86]
VH3d-F          GAG GTG CAG CTG GTG GAG [SEQ ID NO: 87]
VH3e-F          CAG GTG CAG CTG TTG GAG TCT GG [SEQ ID NO: 88]
VH3f-F          GAG GTG CAG CTG GTG GAG TCT GC [SEQ ID NO: 89]
VH3g-F          GAG GTG CAG CTG GTG GAG ACT GG [SEQ ID NO: 90]
VH3h-F          GAG GTG CAG CTG GTG GAG TCT CG [SEQ ID NO: 91]
VH4a-F          CAG STG CAG CTG CAG GAG TCS GG [SEQ ID NO: 92]
VH4b-F          CAG GTG CAG CTA CAG CAG TGG GG [SEQ ID NO: 93]
VH4c-F          CAG GTG CAG CTG CAG GAC TCG GG [SEQ ID NO: 94]
VH4d-F          CAG GTG CGG CTG CAG GAG TCG GG [SEQ ID NO: 95]
VH5-F           GAR GTG CAG CTG GTG CAG TCT GG [SEQ ID NO: 96]
VH6-F           CAG GTA CAG CTG CAG CAG TCA GG [SEQ ID NO: 97]
VH7-F           CAG GTS CAG CTG GTG CAA TCT GG [SEQ ID NO: 98]
Sfi-CH1@5′-R    CCTGGCCGGCCTGGCCACTAGT GAC CGA TGG GCC CTT GGT GGA [SEQ ID NO: 99]

TABLE 3A Kappa chain
Primer name       Sequence
Sfi-VK1a-F        GGGCCCAGGCGGCCRAC ATC CAG ATG ACC CAG TCT CC [SEQ ID NO: 142]
                  ATC CAG ATG ACC CAG TCT CC [SEQ ID NO: 100]
Sfi-VK1b-F        GGGCCCAGGCGGCCGMC ATC CAG TTG ACC CAG TCT CC [SEQ ID NO: 143]
                  ATC CAG TTG ACC CAG TCT CC [SEQ ID NO: 101]
Sfi-VK1c-F        GGGCCCAGGCGGCCGCC ATC CRG ATG ACC CAG TCT CC [SEQ ID NO: 144]
                  ATC CRG ATG ACC CAG TCT CC [SEQ ID NO: 102]
Sfi-VK1d-F        GGGCCCAGGCGGCCGTC ATC TGG ATG ACC CAG TCT CC [SEQ ID NO: 145]
                  ATC TGG ATG ACC CAG TCT CC [SEQ ID NO: 103]
Sfi-VK2a-F        GGGCCCAGGCGGCCGAT ATT GTG ATG ACC CAG ACT CCA C [SEQ ID NO: 146]
                  ATT GTG ATG ACC CAG ACT CCA C [SEQ ID NO: 104]
Sfi-VK2b-F        GGGCCCAGGCGGCCGAT RTT GTG ATG ACT CAG TCT CCA CTC [SEQ ID NO: 147]
                  RTT GTG ATG ACT CAG TCT CCA CTC [SEQ ID NO: 105]
Sfi-VK2c-F        GGGCCCAGGCGGCCGAG ATT GTG ATG ACC CAG ACT CCA C [SEQ ID NO: 148]
                  ATT GTG ATG ACC CAG ACT CCA C [SEQ ID NO: 106]
Sfi-VK3a-F        GGGCCCAGGCGGCCGAA ATT GTG TTG ACR CAG TCT CCA G [SEQ ID NO: 149]
                  ATT GTG TTG ACR CAG TCT CCA G [SEQ ID NO: 107]
Sfi-VK3b-F        GGGCCCAGGCGGCCGAA ATA GTG ATG ACG CAG TCT CCA G [SEQ ID NO: 150]
                  ATA GTG ATG ACG CAG TCT CCA G [SEQ ID NO: 108]
Sfi-VK3c-F        GGGCCCAGGCGGCCGAA ATT GTA ATG ACA CAG TCT CCA SCC [SEQ ID NO: 151]
                  ATT GTA ATG ACA CAG TCT CCA SCC [SEQ ID NO: 109]
Sfi-VK4-F         GGGCCCAGGCGGCCGAC ATC GTG ATG ACC CAG TCT CC [SEQ ID NO: 152]
                  ATC GTG ATG ACC CAG TCT CC [SEQ ID NO: 110]
Sfi-VK5-F         GGGCCCAGGCGGCCGAA ACG ACA CTC ACG CAG TCT CC [SEQ ID NO: 153]
                  ACG ACA CTC ACG CAG TCT CC [SEQ ID NO: 111]
Sfi-VK6a-F        GGGCCCAGGCGGCCGAA ATT GTG CTG ACT CAG TCT CCA GAC [SEQ ID NO: 154]
                  ATT GTG CTG ACT CAG TCT CCA GAC [SEQ ID NO: 112]
Sfi-VK6b-F        GGGCCCAGGCGGCCGAT GTT GTG ATG ACA CAG TCT CCA GCT [SEQ ID NO: 155]
                  GTT GTG ATG ACA CAG TCT CCA GCT [SEQ ID NO: 113]
Linker-CK@5′-R    GGAAGATCTAGAGGAACCACCGAA GAC AGA TGG TGC AGC CAC AGT [SEQ ID NO: 114]
[Note: R represents A or G at that position; Y represents C or T at that position; M represents A or C at that position; and S represents G or C at that position]

TABLE 3B Lambda chain
Primer name         Sequence
Sfi-VL1a-F          GGGCCCAGGCGGCCCAG TCT GTG CTG ACT CAG CCR CC [SEQ ID NO: 156]
                    TCT GTG CTG ACT CAG CCR CC [SEQ ID NO: 115]
Sfi-VL1b-F          GGGCCCAGGCGGCCCAG TCT GTG YTG ACG CAG CCR CC [SEQ ID NO: 157]
                    TCT GTG YTG ACG CAG CCR CC [SEQ ID NO: 116]
Sfi-VL1c-F          GGGCCCAGGCGGCCCAG TCT GTC GTG ACG CAG CCR CC [SEQ ID NO: 158]
                    TCT GTC GTG ACG CAG CCR CC [SEQ ID NO: 117]
Sfi-VL2-F           GGGCCCAGGCGGCCCAG TCT GCC CTG ACT CAG CCT SSC TC [SEQ ID NO: 159]
                    TCT GCC CTG ACT CAG CCT SSC TC [SEQ ID NO: 118]
Sfi-VL3a-F          GGGCCCAGGCGGCCTCC TAT GWG CTG ACT CAG CCA CYC [SEQ ID NO: 160]
                    TAT GWG CTG ACT CAG CCA CYC [SEQ ID NO: 119]
Sfi-VL3b-F          GGGCCCAGGCGGCCTCC TAT GAG CTG ACA CAG CYA CC [SEQ ID NO: 161]
                    TAT GAG CTG ACA CAG CYA CC [SEQ ID NO: 120]
Sfi-VL3c-F          GGGCCCAGGCGGCCTCT TCT GAG CTG ACT CAG GAC CC [SEQ ID NO: 162]
                    TCT GAG CTG ACT CAG GAC CC [SEQ ID NO: 121]
Sfi-VL3d-F          GGGCCCAGGCGGCCTCC TAT GAG CTG ATG CAG CCA CC [SEQ ID NO: 163]
                    TAT GAG CTG ATG CAG CCA CC [SEQ ID NO: 122]
Sfi-VL3e-F          GGGCCCAGGCGGCCTCC TAT GAG CTG ACA CAG CCA TCC TC [SEQ ID NO: 164]
                    TAT GAG CTG ACA CAG CCA TCC TC [SEQ ID NO: 123]
Sfi-VL3f-F          GGGCCCAGGCGGCCTCC TAT GAG CTG ACT CAG CCA CAC TC [SEQ ID NO: 165]
                    TAT GAG CTG ACT CAG CCA CAC TC [SEQ ID NO: 124]
Sfi-VL3g-F          GGGCCCAGGCGGCCTCC TCT GGG CCA ACT CAG GTG CC [SEQ ID NO: 166]
                    TCT GGG CCA ACT CAG GTG CC [SEQ ID NO: 125]
Sfi-VL4a-F          GGGCCCAGGCGGCCCAG CYT GTG CTG ACT CAA TCR YCC [SEQ ID NO: 167]
                    CYT GTG CTG ACT CAA TCR YCC [SEQ ID NO: 126]
Sfi-VL4b-F          GGGCCCAGGCGGCCCTG CCT GTG CTG ACT CAG CCC CC [SEQ ID NO: 168]
                    CCT GTG CTG ACT CAG CCC CC [SEQ ID NO: 127]
Sfi-VL5a-F          GGGCCCAGGCGGCCCAG SCT GTG CTG ACT CAG CCR BCT TCC [SEQ ID NO: 169]
                    SCT GTG CTG ACT CAG CCR BCT TCC [SEQ ID NO: 131]
Sfi-VL5b-F          GGGCCCAGGCGGCCCAG CCT GTG CTG ACT CAG CCA ACC TCC [SEQ ID NO: 170]
                    CCT GTG CTG ACT CAG CCA ACC TCC [SEQ ID NO: 132]
Sfi-VL6-F           GGGCCCAGGCGGCCAAT TTT ATG CTG ACT CAG CCC CAC [SEQ ID NO: 171]
                    TTT ATG CTG ACT CAG CCC CAC [SEQ ID NO: 128]
Sfi-VL7-F           GGGCCCAGGCGGCCCAG RCT GTG GTG ACT CAG GAG CC [SEQ ID NO: 172]
                    RCT GTG GTG ACT CAG GAG CC [SEQ ID NO: 129]
Sfi-VL8-F           GGGCCCAGGCGGCCCAG ACT GTG GTG ACC CAG GAG CC [SEQ ID NO: 173]
                    ACT GTG GTG ACC CAG GAG CC [SEQ ID NO: 130]
Sfi-VL9-F           GGGCCCAGGCGGCCCAG CCT GTG CTG ACT CAG CCA CC [SEQ ID NO: 174]
                    CCT GTG CTG ACT CAG CCA CC [SEQ ID NO: 177]
Sfi-VL10-F          GGGCCCAGGCGGCCCAG GCA GGG CTG ACT CAG CCA CC [SEQ ID NO: 175]
                    GCA GGG CTG ACT CAG CCA CC [SEQ ID NO: 178]
Linker-CL@5′1-R     GGAAGATCTAGAGGAACCACCCGT GGG GTT GGC CTT GGG CTG ACC [SEQ ID NO: 133]
Linker-CL@′2-7-R    GGAAGATCTAGAGGAACCACCCGA GGG GGC AGC CTT GGG CTG ACC [SEQ ID NO: 134]
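Several primers in Tables 2, 3A, and 3B contain degenerate positions. As a non-limiting illustration, the following Python sketch enumerates the individual oligonucleotides represented by a degenerate primer. The codes R, Y, M, and S follow the note to Table 3A; the codes K (G or T), W (A or T), and B (C, G, or T) are not defined in that note and are assumed here to follow the standard IUPAC nucleotide code. The function name is hypothetical.

```python
from itertools import product

# R, Y, M, S as defined in the note to Table 3A; K, W, B assumed per IUPAC.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "S": "CG",
    "K": "GT", "W": "AT", "B": "CGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every non-degenerate sequence encoded by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    choices = [IUPAC[b] for b in bases]  # KeyError for codes not listed above
    return ["".join(combo) for combo in product(*choices)]


# VH4a-F (SEQ ID NO: 92) has two S positions, so it encodes four sequences.
for seq in expand_degenerate("CAG STG CAG CTG CAG GAG TCS GG"):
    print(seq)
```

The number of sequences grows multiplicatively with the number of degenerate positions, which is one reason a larger primer set with few degenerate positions per primer may be easier to reason about than a small, highly degenerate set.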

The invention might also include a set of reverse PCR primers for amplifying the heavy and light chain human Ig V-regions from a sequence located 3′ of the nucleotides encoding the V-region (e.g., nucleic acid sequences encoding a hinge or constant region). The reverse primers can be selected to allow amplification of all subclasses of an Ig isotype (e.g., gamma, mu, alpha, epsilon, kappa, lambda, etc.), or the individual sub-classes of heavy chains (e.g., gamma 1, gamma 2, gamma 3, gamma 4, alpha 1 and alpha 2) to have more focused libraries. The particular sequences of such reverse primers can be determined by one of skill in the art based on published sequences of the Ig isotype. Representative examples are shown in Table 4 below:

TABLE 4
All Heavy Chains      IgH-CH2-R       CCA CCA CCA CGC AYG TGA CC [SEQ ID NO: 135]
Lambda Light Chain    CL-R            CTT CTC CAC GGT GCT CCC TTC ATG CG [SEQ ID NO: 136]
Kappa Light Chain     CK-R            ACC CGA TTG GAG GGC GTT ATC CAC CT [SEQ ID NO: 137]
Isotype Specific      IgG1-Hinge-R    TGT GTG AGT TTT GTC ACA AGA TTT GGG CTC [SEQ ID NO: 138]
                      IgG2-Hinge-R    CGG TGG GCA CTC GAC ACA ACA TTT GCG CTC [SEQ ID NO: 139]
                      IgG3-Hinge-R    AGT TGT GTC ACC AAG TGG GGT TTT GAG CTC [SEQ ID NO: 140]
                      IgG4-Hinge-R    TGA TGG GCA TGG GGG ACC ATA TTT GGA CTC [SEQ ID NO: 141]

Although the foregoing specific primers are preferred, other primers that hybridize to the complement of the specific sequences set forth herein under hybridization conditions used in PCR amplification or stringent hybridization conditions might also be used. For example, primers that have at least 75% (e.g., at least 80%, 85%, 90%, or 95%) sequence identity to, are substantially homologous to, and/or bind under stringent hybridization conditions to the complement of the specific sequences set forth herein might be used. The length of these primers might range from about 15-100, 17-50, 18-40, or 20-30 nucleotides, where “about” means +/−5 nucleotides.

V-Region Libraries

The heavy and light chain Ig amplification products made using the foregoing primers in PCR can be used to create V-region libraries. For example, nucleic acids encoding Ig chains can be inserted into vectors, and the resulting constructs can be introduced into host cells (e.g., prokaryotic cells such as bacteria or eukaryotic cells such as yeast or mammalian cells) to make a library.

As one example, nucleic acids encoding naturally occurring Igs can be obtained from a donor sample containing B lymphocytes (e.g., peripheral blood mononuclear cells, isolated B cells or plasma cells, buffy coats, bone marrow, lymph node, spleen, or CNS samples). Typically, the samples are obtained from human donors previously determined to possess Abs specific for target Ags of interest (e.g., by ELISA or RIA). For identifying Abs to self Ags, preferred donors are those who have cytokine-specific autoantibodies (e.g., to interferon gamma, tumor necrosis factor alpha, interleukin 4 and interleukin 10, GM-CSF, erythropoietin, IL-6, IL-17, IL-22, interferon-α, interferon-β, IL-2, IL-8, vascular endothelial growth factor, and/or granulocyte-colony stimulating factor), and those with autoimmune diseases such as rheumatoid arthritis or lupus.

In one method, total RNA is extracted from these cells and reverse-transcribed using polyA primers. The resulting cDNA is used as template for PCR amplification using the primers described above and a suitable PCR strategy such as that described in the Examples below. The PCR products having the correct size can be visualized on agarose gels, excised, and purified. The purified products can then be digested with a suitable restriction enzyme (e.g., NotI or NcoI for the V heavy chain and SfiI or SalI for the V light chain). The digested fragments can again be gel purified and ligated with ligase (e.g., T4 DNA ligase) into a vector (e.g., a phagemid vector previously linearized with either NotI/NcoI or SfiI/SalI). The ligated products can be electroporated into competent host cells (e.g., E. coli) to prepare heavy and light chain libraries. Isotype- and subclass-specific libraries can be made using reverse primers specific to the isotype or subclass.
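When a restriction digest is planned, it can be useful to know where the chosen enzyme's recognition site occurs in a primer or amplicon. The following Python sketch, offered only as an illustrative aid and not as part of the described method, scans a sequence for the recognition sites of the four enzymes named above. The recognition sequences are the commonly published ones (NotI GCGGCCGC, NcoI CCATGG, SalI GTCGAC, SfiI GGCCNNNNNGGCC); the function name and dictionary are hypothetical.

```python
import re

# Commonly published recognition sequences; N in the SfiI site is any base.
RECOGNITION_SITES = {
    "NotI": "GCGGCCGC",
    "NcoI": "CCATGG",
    "SalI": "GTCGAC",
    "SfiI": "GGCC[ACGT]{5}GGCC",
}


def find_sites(seq: str) -> dict[str, list[int]]:
    """Return 0-based start positions of each recognition site on the given strand."""
    seq = seq.replace(" ", "").upper()
    return {
        # The lookahead lets overlapping occurrences be reported as well.
        name: [m.start() for m in re.finditer(f"(?={pattern})", seq)]
        for name, pattern in RECOGNITION_SITES.items()
    }


# Sfi-VK4-F (SEQ ID NO: 152): the 5' adapter carries the SfiI site.
print(find_sites("GGGCCCAGGCGGCCGAC ATC GTG ATG ACC CAG TCT CC"))
```

All four sites are palindromic (for SfiI, in its defined bases), so scanning one strand is sufficient for this purpose.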

Display Libraries

The heavy and light chain libraries can be used to create other libraries of cells that display Fabs, scFvs, or other Ab-like constructs that contain heavy and light chains. Such libraries can be created by isolation of the heavy and light chain-encoding nucleic acids by the above-described methods or by repurifying these nucleic acids from existing V-region libraries. By inserting the isolated heavy and light chain-encoding nucleic acids into suitable vectors and expressing these constructs in host cells, combinations of heavy and light chains can be assessed for Ag binding and/or subjected to multiple selection rounds to identify those which bind a target Ag with high affinity. These can be characterized by sequencing. See, e.g., Clackson et al., Nature 352:624-628, 1991; McCafferty et al., Nature 348:552-554, 1990; and Kang et al., Proc. Nat'l Acad. Sci. 88:4363-4366, 1991.

For preparation of a scFv library, amplified VH and VL genes are gel-purified on agarose, and the scFv genes are assembled by overlap PCR using VH and VL fragments as templates. First, VH and VL are assembled with a linker by PCR without primers, in which the short regions of complementarity at the ends of the linker promote hybridization of the various fragments. PCR is performed first in the absence of primers, and then after adding the outer primers. The scFv are digested with a suitable restriction enzyme, agarose gel-purified, and then ligated into a phage-display vector that had been cut with the same restriction enzyme. The ligated products are inserted (e.g., electroporated) into competent host cells (e.g., bacteria, yeast, or mammalian cells), and the cells are plated and cultured on suitable plates. Clones are then scraped off the plates into a suitable freezing medium and frozen (e.g., in superbroth medium with 10% glycerol at −70° C.). Fab libraries can be prepared in a similar fashion albeit without using a linker. Ag-specific Fab and scFv can be obtained by screening bacterial phage display, yeast display, and/or mammalian cell display libraries using selection methods such as panning, cell sorting, using an automated screener/picker (e.g., CLONEPIX™ systems), and magnetic bead separations.

Expression of V-region nucleic acids amplified using leader sequence primers leads to the inclusion of leader peptides in the V-region protein product. When mammalian cells such as CHO cells are used to express these nucleic acids, endogenous post-translational processing mechanisms remove the leader peptide, allowing expression of the V-region or multiple V-region (e.g., heavy and light chain) constructs without the leader peptides interfering with Ag-binding screening assays. In bacterial phage display techniques, however, leader peptides are not removed. The presence of leader peptides can interfere in Ag-binding screening assays and thereby prevent detection of V-regions with high affinity for a target Ag.

To overcome this problem, the invention also provides for the creation of a second V-region library that is made using a larger than conventional set of 5′ V-region primers. This larger set of primers allows a greater number of V-region sequences to be amplified compared to the number amplified using a traditional primer set. Of course, because these primers are designed to amplify germline sequences and not somatically mutated sequences, the primers themselves will introduce sequence errors in the amplification products (i.e., so that the somatic mutations are removed and substituted with germline sequences). Phage display is used to screen for those V-regions in scFv that bind the target Ag with high affinity. The nucleic acids encoding the identified V-regions are then sequenced, and, using that sequence information, reverse primers that specifically amplify those nucleic acids are made and used along with leader sequence forward primers to amplify the V-regions from the first library (whose products were amplified with leader sequence primers) to identify the actual and complete sequences of the Ag-specific V-regions that occur in the donor from which the first library was made.
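The correction step described above is performed by PCR. Purely as an in silico analogue, the following Python sketch shows the underlying logic: a stretch of the phage-selected sequence downstream of the V-region primer (standing in for the clone-specific reverse primer site) is used to pick out the matching leader-primed sequence, from which the donor's original 5′ V-region sequence is read. All sequences other than the two primers from the tables, the anchor position and length, and the function name are hypothetical.

```python
def recover_original_v_regions(leader_clones, leader_primers, selected_v,
                               anchor_start, anchor_len=20):
    """Return leader-library sequences (leader start through anchor) sharing the anchor.

    The anchor is a stretch of the phage-selected sequence lying downstream of
    the V-region primer; it plays the role of the clone-specific reverse primer.
    """
    anchor = selected_v[anchor_start:anchor_start + anchor_len]
    if len(anchor) < anchor_len:
        raise ValueError("anchor runs past the end of the selected sequence")
    recovered = []
    for clone in leader_clones:
        # Only consider sequences that begin with one of the leader primers.
        if not any(clone.startswith(p) for p in leader_primers):
            continue
        pos = clone.find(anchor)
        if pos != -1:
            recovered.append(clone[:pos + anchor_len])
    return recovered


# Toy data: primers from Tables 1A and 2; everything else is made up.
leader_primer = "ATGGAGTTGGGGCTGAGCTGG"            # NA-LH3-d-F, SEQ ID NO: 13
leader = leader_primer + "GTTTTCCTT"
downstream = "TCCCTGAGACTCTCCTGTGCAGCCTCTGGA"
donor_v = "CAGGTGCAGCTGTTGGAGTCAGG" + downstream   # mutated 5' end
selected_v = "CAGGTGCAGCTGGTGGAGTCTGG" + downstream  # 5' end imposed by VH3b-F
unrelated = "ATGAAACACCTGTGGTTCTTCCTCCTG" + "ACGTACGTACGT"

hits = recover_original_v_regions(
    [leader + donor_v, unrelated],
    [leader_primer, "ATGAAACACCTGTGGTTCTTCCTCCTG"],
    selected_v,
    anchor_start=23,
)
print(hits)
```

In the toy data the recovered sequence carries the donor's mutated 5′ end rather than the germline sequence imposed by the V-region primer.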

Ag-Specific Human mAbs or Other Ag-Targeting Constructs

The heavy chain and/or light chain V-regions (or portions thereof such as CDRs) identified as contributing to binding a particular target Ag can be used to make Ag-targeting constructs including full-length Abs, Ab fragments, engineered Abs, and fusion proteins. The full-length Abs and other Ag-targeting constructs can include non-V-region sequences encoding Ig sequences specific to a particular isotype or subclass, e.g., IgG1, IgG2, IgG3, IgG4, IgD, IgA1, IgA2, IgE, or IgM. An Ab fragment may be a Fab′, F(ab)₂, F(ab′)₂, F(ab)₃, Fv, scFv, dsFv, Fd, or dAb. Engineered Abs can be minibodies, diabodies, triabodies, tetrabodies, or kappa bodies. Ag-binding constructs can be made using recombinant or protein engineering techniques, e.g., wherein the V-regions or CDRs are grafted into the remaining portions of the constructs at the nucleic acid level.

Proteomics

In addition to the foregoing, a proteomics approach is also used wherein Ag-specific Abs in a donor tissue sample (e.g., plasma IgG) are isolated using known techniques (e.g., protein G, protein A, protein L, or ion exchange chromatography followed by Ag-affinity chromatography). The resulting Ag-specific Ab mixture is further separated, e.g., by capillary isoelectric focusing. The Igs are then digested with proteases and the peptide fragment amino acid sequences are determined by MS. The peptide sequence information can then be used to confirm that Ag-specific V-regions were correctly identified in the library screening process, and PCR primers can also be designed to amplify Ag-specific V-regions from the earlier prepared libraries.
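The confirmation step described above can be illustrated with a short Python sketch that translates a candidate V-region nucleic acid using the standard genetic code and reports which MS-derived peptides occur in the translation. This is a minimal sketch under simplifying assumptions: translation starts in frame at the first nucleotide, peptides must match exactly, and the example peptides are made up. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna: str) -> str:
    """Translate in frame from the first base; a trailing partial codon is ignored."""
    dna = dna.replace(" ", "").upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def supporting_peptides(v_region_dna: str, peptides: list[str]) -> list[str]:
    """Return the peptides found as exact substrings of the translated V-region."""
    protein = translate(v_region_dna)
    return [p for p in peptides if p in protein]


candidate = "CAG GTG CAG CTG GTG GAG TCT GGG"  # starts with VH3b-F, SEQ ID NO: 85
print(translate(candidate))
print(supporting_peptides(candidate, ["QLVES", "EVQLL"]))  # made-up peptides
```

Exact substring matching is a deliberate simplification; in practice, peptide identifications from MS data carry their own ambiguities that this sketch does not model.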

Kits

One or more of the sets of PCR primers described herein and/or libraries described herein can be packaged in a kit which can include instructions for use.

EXAMPLES

Example 1 PCR Strategy

Referring now to FIG. 1, three rounds of PCR for the heavy chain and two rounds of PCR for the light chain are performed. A cDNA library made from human PBMCs is used as a template, and the first round of PCR is performed with the different leader forward primers (see Table 1A) and a reverse primer located in the CH2 region of the heavy chain. The size of the resulting PCR product is 905-1046 nucleotides depending on the subclass of IgG. The second round of PCR uses the same forward primer as the first round, but reverse primers located in the hinge regions of the four different subclasses of IgG heavy chains, using the reverse primers listed in Table 4. This step results in a sub-class specific heavy chain amplification. Referring to FIG. 2, the third PCR round for the heavy chain uses forward primers located at the 5′ end of the V-region (see Table 2) and a reverse primer located at the 5′ end of the CH1 domain. The forward primers have adapters that introduce the long linker for the ScFv fragment and the reverse primer has an adapter that introduces the Sfi I site.

For the light chains, referring to FIG. 3, a PBMC cDNA library is also used as a template, and the first round of PCR is performed using leader forward primers and reverse primers located in the Cκ and Cλ regions of the kappa and lambda chains. The resulting PCR product is run on an agarose gel, and for the kappa chain, a ~550 bp product is gel purified and used as a template in a second round PCR reaction. For the lambda chain, a ~630 bp product is gel purified and used as a template for the second round PCR reaction. The reverse primers are listed in Table 4. Referring to FIG. 4, the second round of PCR for the light chain uses forward primers located at the 5′ end of the V-region and a reverse primer located at the 5′ end of the Cκ or Cλ domains. The forward primers have adapters that introduce the Sfi I site and the reverse primer has an adapter that introduces the long linker sequence.

Overlap extension PCR is performed, wherein the third round heavy chain products are mixed with equal ratios of kappa and lambda chain second round PCR products to generate overlap products. The overlap products are amplified using light chain adapter specific forward primers and a heavy chain adapter specific reverse primer to generate ScFv fragments. The ScFv fragments are digested with Sfi I and cloned into the pComb3X vector.
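The overlap that drives this assembly can be checked from the listed sequences. The 5′ adapter shared by the light chain reverse primers (SEQ ID NOs: 114, 133, and 134) is the reverse complement of the first 21 nucleotides of the longlink sequence (SEQ ID NO: 176), so a light chain product ends with the same 21 nucleotides with which a heavy chain product begins. The Python sketch below verifies this and joins two toy fragments at their overlap; the short flanking V-region stretches, the minimum overlap of 15, and the function names are illustrative assumptions.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

LONGLINK = "GGTGGTTCCTCTAGATCTTCCTCCTCTGGTGGCGGTGGCTCGGGCGGTGGTGGG"  # SEQ ID NO: 176
LIGHT_REV_ADAPTER = "GGAAGATCTAGAGGAACCACC"  # 5' part of SEQ ID NOs: 114, 133, 134


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def join_by_overlap(upstream: str, downstream: str, min_overlap: int = 15) -> str:
    """Join two top-strand fragments at the longest suffix/prefix overlap."""
    longest = min(len(upstream), len(downstream))
    for n in range(longest, min_overlap - 1, -1):
        if upstream[-n:] == downstream[:n]:
            return upstream + downstream[n:]
    raise ValueError("fragments share no overlap of the required length")


# Toy fragments: a few V-region bases flanking the real adapter sequences.
light_product = "GACATCCAGATGACC" + reverse_complement(LIGHT_REV_ADAPTER)
heavy_product = LONGLINK + "CAGGTGCAGCTG"

print(reverse_complement(LIGHT_REV_ADAPTER) == LONGLINK[:21])
print(join_by_overlap(light_product, heavy_product))
```

The joined toy sequence has the order light chain, linker, heavy chain, consistent with the adapter placement described for FIGS. 2 and 4.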

Example 2 PCR Using V-Region Primers

The following experiment was performed to test the effect of leader sequences on phage binding. A test ScFv fragment was generated with and without heavy and light chain leaders, and expressed on phages. The phages were used in an ELISA against specific antigens. FIG. 5 shows that the presence of leader peptide upstream of either the heavy chain variable or the light chain variable region inhibited the ScFv fragment from binding to its antigen.

Based on the above result, V-region specific primers are used in the last round of PCR, in which the long linker and Sfi I site are introduced. Once high affinity antigen-specific candidate ScFv fragments are sequenced, a primer is designed in the Framework 2 region as a reverse primer, and a leader specific forward primer is used on the second PCR product of the heavy chain and the first PCR product of the light chains, to obtain the correct 5′ sequence of the V-region. The corrected V-region is used to generate full-length antibodies for biological activity assays.

Example 3 Discovery of Heavy and Light Chains that Bind a Human Cytokine

Using the methods and compositions described above, human heavy and light chain genes encoding two different monoclonal antibodies against a human cytokine were identified. Both of these monoclonal antibodies exhibited high binding affinity for the antigen (in the nanomolar to picomolar range). This demonstrates that the methods and compositions of the invention are useful for identifying low frequency (in this case the antibodies were against a self antigen) antibodies from a human genome.

Other Embodiments

It is to be understood that while the invention has been described in conjunction with the detailed description thereof, the foregoing description is intended to illustrate and not limit the scope of the invention, which is defined by the scope of the appended claims. Other aspects, advantages, and modifications are within the scope of the following claims.

What is claimed is:
1. A purified monoclonal antibody comprising a variable region amino acid sequence that promotes binding of the antibody to a target antigen and has somatic hypermutations in its first N-terminal 8 amino acids, wherein the variable region amino acid sequence is determined by screening a library of human immunoglobulin nucleotide sequences made by a method comprising the steps of: (a) using a first PCR to amplify human immunoglobulin heavy and light chain nucleotide sequences from a cDNA library made from a sample of human peripheral blood mononuclear cells obtained from a human donor previously determined to possess antibodies specific for target antigens of interest, wherein leader-specific forward primers that amplify at least 90% of the human immunoglobulin V-region genes and reverse primers for amplifying nucleic acid sequences encoding hinge or constant regions are used in the first PCR amplification; (b) isolating the amplification products from the first PCR step; (c) creating a first set of constructs by inserting the isolated amplification products from the first PCR step into expression vectors; and (d) introducing the first set of constructs into a first set of host cells to create the library of human immunoglobulin nucleotide sequences.
2. The purified monoclonal antibody of claim 1, wherein the leader-specific primers comprise a set of ten or more different polynucleotides each of which binds under stringent hybridization conditions to the complement of a different nucleic acid sequence selected from the group consisting of SEQ ID NOs: 1-76.
3. The purified monoclonal antibody of claim 1, wherein the antibody comprises an immunoglobulin heavy chain having somatic hypermutations in the first N-terminal 8 amino acids of its variable region covalently linked to an immunoglobulin light chain having somatic hypermutations in the first N-terminal 8 amino acids of its variable region.